Abstract

We develop some sufficient conditions for the stochastic ordering between first-passage times, 
in a fixed state, for two Markov chains. In particular, we focus attention on the so called skip- 
free Markov chains. For our purposes, we develop a special type of coupling. We also define a 
relation between two Markov chains, which can have a natural role in the comparison between 
the tail behaviors of the distributions of first-passage times. Finally, we present some examples 
dedicated to words' occurrences. 
1 Introduction

We consider a Markov chain $X = \{X_n\}_{n=0,1,\dots}$ on the state space $E = E_k = \{0,1,\dots,k\}$ or
$E = E_\infty = \{0,1,\dots\}$. We denote by $T_h$ the stopping times

$T_h = \inf\{n \in \mathbb{N} : X_n \ge h\}, \quad h = 1,2,\dots,k. \quad (1)$

$T_h$ is thus the random time needed to reach or exceed the level $h$. In particular we will consider
transition matrices $P = (p_{i,j})_{i,j \in E}$ with the property

$p_{i,j} = 0 \quad \text{if } 1 \le i+1 < j. \quad (2)$
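Property (2) is easy to test mechanically. The following is a minimal sketch, assuming the matrix is stored as a list of rows; the function name `is_skip_free` and the two example matrices are hypothetical and not taken from the paper.

```python
def is_skip_free(P, tol=0.0):
    """Return True when P[i][j] vanishes for every j > i + 1, as in property (2)."""
    size = len(P)
    return all(abs(P[i][j]) <= tol
               for i in range(size)
               for j in range(i + 2, size))


# Hypothetical 3-state examples: the first chain climbs at most one level per step.
P_ok = [[0.5, 0.5, 0.0],
        [0.2, 0.3, 0.5],
        [0.0, 0.0, 1.0]]
P_bad = [[0.5, 0.0, 0.5],   # jumps from 0 straight to 2
         [0.2, 0.3, 0.5],
         [0.0, 0.0, 1.0]]
```

Downward jumps of any size are allowed; only upward jumps of more than one level violate (2).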

The literature devoted to this topic is very wide. See e.g. [10], [8], [5] and references cited
therein. Besides the theoretical interest, the analysis of the first-passage times for this class 
of Markov chains emerges in the applications of probability to different fields such as reliability, 
networks, biology, and so on. See also [Ij for some related discussion (Success Runs and Machine 
Replacement). In particular chains of this type are encountered, in a fairly direct way, in the
problem of first occurrences of words in random sequences of letters from an alphabet. In such
a case we have $E = E_k$. For simplicity's sake, from now on we will limit our attention to the
finite case $E = E_k$. However most of our results can be appropriately extended to the infinite
state-space case. Throughout the paper, we will denote by $\mathcal{T}_k$ the class of transition matrices on
the state space $E_k$ satisfying (2). With some abuse of notation we also say that a Markov chain
$X$ is in $\mathcal{T}_k$ if its transition matrix is in $\mathcal{T}_k$.

The problem of words' occurrence suggested some of our results and will be briefly recalled in 
the last section. A very large literature has been devoted to such a field, in different frameworks
and from different points of view; typically attention has been concentrated on different aspects
of the exact computation of $E(T_k)$ or of the probability distribution of $T_k$.

In this paper we rather consider, for pairs of Markov chains $X$ and $\tilde{X}$ on the same state space
$E_k$, stochastic orderings between the corresponding first-passage times $T_k$ and $\tilde{T}_k$. The idea of
studying stochastic ordering of first-passage times was already considered in the papers [5] and
[10]. In particular, the paper by Irle and Gani, i.e. [10], presents some results in the same spirit
as ours, in the context of detection of words.

Different notions of stochastic orders might be considered for the $\mathbb{N}$-valued random variables
$T_k$ and $\tilde{T}_k$ (see e.g. [IHIIT]); as natural ones in our context, we consider the usual stochastic order
$T_k \preceq_{st} \tilde{T}_k$ and the tail (or asymptotic) stochastic order, that is defined in terms of tail behavior
of the distributions. A rather detailed analysis of the stochastic tail order has been offered in the 
recent work [Oj. 

In our results concerning the stochastic order, the assumption that $X$ and $\tilde{X}$ belong to $\mathcal{T}_k$ will
be specifically used. The proof of our results in such direction will be based on a coupling method
that takes essentially into account the order structure of the state space $E_k$. More precisely,
on a same probability space, we construct two Markov chains (sharing the laws of $X$ and $\tilde{X}$,
respectively) in such a way that they are "coupled" only in some instants when they visit the same 
states. When one of the two chains has a transition to a state "higher" than the other one, it 
stops and waits for the latter, which has an independent evolution in the meantime. A similar 
approach has been also used in [7]. We believe that such a type of (partial) coupling can have 
some more general interest and that it might be applied in different other problems related with 
stochastic comparisons. Furthermore we point out that this method of coupling turns out to be a 
constructive one and can work especially well in the proof of strict inequalities between expected 
values of interest. 

For the results concerning the asymptotic stochastic order, we use different methods of proofs. 
In such a frame, we need to define a suitable order relation between two stochastic matrices of 
the same size. The circumstance that such a relation is maintained under products will have a 
relevant role in our derivations. In this part of the paper the condition (2) will not be necessary.

Our results may also be used to deal with the case of continuous time. Processes in continuous
time with a property analogous to (2) have been called free of positive skips (see [11]). In [13] it
has been shown that, under simple conditions, first-passage times for such processes have the New
Better than Used property. It is simple to see that also in our case the first-passage times have
the New Better than Used property in discrete time.
In the specific cases of chains related to word occurrences, the distributions of the $T_k$'s are
generalized geometric distributions (see e.g. [6]). From a distributional point of view, such discrete
distributions may appear rather simple at a first glance. In particular, similarly to the geometric
ones, they are completely determined by their expected values. However they manifest several
apparently paradoxical aspects (see in particular [2l[3l[l]). On the basis of our analysis one might 
show that some unexpected behavior emerges also in the comparison between inequalities of the
type $T_k \preceq_{st} \tilde{T}_k$ and those of the type $P(T_k < \tilde{T}_k) > P(\tilde{T}_k < T_k)$.

Several results in the literature concerning waiting times to words' occurrences have been based 
on the notion of leading number associated to a word. Such an analysis is, in a sense, alternative 
to the one based on Markov chains. As a main feature of this paper, we discuss that the two 
different approaches can be usefully compared and combined. 

The structure of the paper is as follows. In Section 2 we present our results concerning the
stochastic order. Section 3 will be devoted to the case of the asymptotic stochastic order. In
Section 4 we discuss the theme of waiting times to words' occurrences and present, on such a
basis, some examples of applications and some remarks concerning the results of the previous
Sections 2 and 3.



2 A class of Markov chains and stochastic comparisons between 
absorbing times 

First we recall the standard notation for the usual stochastic ordering between two real random
variables. For $X$, $Y$, $X : \Omega \to \mathbb{R}$, $Y : \Omega' \to \mathbb{R}$, we write $X \succeq_{st} Y$ if the following two equivalent
conditions hold

a) $P(X > t) \ge P(Y > t)$ for any $t \in \mathbb{R}$;

b) $E[g(X)] \ge E[g(Y)]$ for all increasing functions $g : \mathbb{R} \to \mathbb{R}$ for which the expectations
$E[g(X)]$, $E[g(Y)]$ exist.

We will use the same symbol $\succeq_{st}$ (and its reverse $\preceq_{st}$) also to compare two probability distributions over $\mathbb{R}$.

Notice that the relation $X \succeq_{st} Y$ does not require that $X$, $Y$ are defined on a same probability
space; however we recall the following important characterization (see e.g. [16]), that will be used
in what follows: $X \succeq_{st} Y$ if and only if there exists a probability space $(\hat{\Omega}, \hat{\mathcal{F}}, \hat{P})$ and two random
variables $\hat{X}, \hat{Y} : \hat{\Omega} \to \mathbb{R}$ such that

• $\hat{X} \overset{d}{=} X$, $\hat{Y} \overset{d}{=} Y$;

• $\hat{P}(\hat{X} \ge \hat{Y}) = 1$.
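For distributions on a finite set $\{0,\dots,k\}$, both the order and the coupling in the characterization above can be written down explicitly. The sketch below is only an illustration under that assumption: `st_geq` compares cumulative sums, and `quantile` is the usual inverse-transform map, so that feeding the same uniform value to both distributions yields an ordered pair. The names and the two example vectors are hypothetical.

```python
from itertools import accumulate


def st_geq(p, q, tol=1e-12):
    """True if p dominates q in the usual stochastic order (p >=_st q).

    On {0, ..., k} this holds iff each cumulative sum of p is at most
    the matching cumulative sum of q.
    """
    return all(a <= b + tol for a, b in zip(accumulate(p), accumulate(q)))


def quantile(p, u):
    """Inverse-transform map: smallest i with p[0] + ... + p[i] > u."""
    total = 0.0
    for i, mass in enumerate(p):
        total += mass
        if total > u:
            return i
    return len(p) - 1  # guards against rounding when u is very close to 1


# Hypothetical example: p puts more mass on high states than q.
p = [0.1, 0.3, 0.6]
q = [0.5, 0.3, 0.2]
```

With a single uniform variable $U$, the pair `(quantile(p, U), quantile(q, U))` has the prescribed marginals and is ordered for every value of $U$ whenever `st_geq(p, q)` holds.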

In what follows we present several results in which we establish stochastic comparisons between 
first passage times of two different Markov chains in $\mathcal{T}_k$.

We consider two Markov chains $X$ and $\tilde{X}$, belonging to $\mathcal{T}_k$, with transition matrices $P =
(p_{i,j})_{i,j \in E}$, $\tilde{P} = (\tilde{p}_{i,j})_{i,j \in E}$, and initial distributions $\pi_0 = (\pi_0(i))_{i \in E}$, $\tilde{\pi}_0 = (\tilde{\pi}_0(i))_{i \in E}$, respectively.
We furthermore consider $T_h$ and $\tilde{T}_h$, where $T_h$ is defined for $X$ in (1) and $\tilde{T}_h$ is the analogue for $\tilde{X}$.
In our first result we exploit the condition that the Markov chains belong to $\mathcal{T}_k$. The proof is
based on the coupling method, namely on the characterization of $\preceq_{st}$ given above. However, we
do not require the existence of a coupling for two chains having their trajectories almost surely
ordered.

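Theorem 1. Let $P = (p_{i,j} : i,j = 0,\dots,k)$ and $\tilde{P} = (\tilde{p}_{i,j} : i,j = 0,\dots,k)$ be two transition
matrices in $\mathcal{T}_k$. Assume that, for any $i = 0,\dots,k-1$, there exists $m(i)$ such that

i) $i + m(i) \le k$;

ii) $p^{(m(i))}_{i,\cdot} \succeq_{st} \tilde{p}^{(m(i))}_{i,\cdot}$, where $p^{(n)}_{i,\cdot}$ denotes the $i$-th row of $P^n$.

Moreover suppose that the initial measures are stochastically ordered, $\pi \succeq_{st} \tilde{\pi}$. Then

$T_k \preceq_{st} \tilde{T}_k. \quad (3)$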

Proof. Let us fix a particular choice of $m(0),\dots,m(k-1)$ such that i) and ii) hold. Moreover, for
future convenience, we fix $m(k) = 1$. We will use a coupling method and we will obtain the proof
in a recursive way.

On a same probability space $(\Omega, \mathcal{F}, P)$, we define a sequence of i.i.d. random variables $U =
\{U_n\}_{n \in \mathbb{N}}$ and an independent array of i.i.d. random variables $\tilde{U} = \{U_{r,n}\}_{r \in \mathbb{N},\, n \in \mathbb{N}_+}$. All these
variables have uniform distribution on $[0,1]$.

By using $U$, we construct on $(\Omega, \mathcal{F}, P)$ a homogeneous Markov chain $X = (X_n)_{n \in \mathbb{N}}$ having the
law given by the initial distribution $\pi = (\pi_0,\dots,\pi_k)$ and transition matrix $P = (p_{i,j})_{i,j \in E_k}$. We
will also construct, by using $U$ and $\tilde{U}$, a homogeneous Markov chain $\tilde{X} = (\tilde{X}_n)_{n \in \mathbb{N}}$ having the law
given by the initial distribution $\tilde{\pi} = (\tilde{\pi}_0,\dots,\tilde{\pi}_k)$ and transition matrix $\tilde{P} = (\tilde{p}_{i,j})_{i,j \in E_k}$. We will
prove that the stopping times $T_k$ and $\tilde{T}_k$, corresponding to $X$ and $\tilde{X}$ respectively, are ordered in
the sense that

$T_k(\omega) \le \tilde{T}_k(\omega)$

for each $\omega \in \Omega$.

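First, we define $X_0$ and $\tilde{X}_0$ with distribution $\pi$ and $\tilde{\pi}$, respectively.
We set

$X_0(U_0) := \inf\{i \le k : \sum_{l=0}^{i} \pi_l > U_0\}, \quad (4)$

and analogously

$\tilde{X}_0(U_0) := \inf\{i \le k : \sum_{l=0}^{i} \tilde{\pi}_l > U_0\}. \quad (5)$

It is immediately seen that $X_0(U_0) \sim \pi$ and $\tilde{X}_0(U_0) \sim \tilde{\pi}$. Furthermore, for each value $u \in [0,1]$,
$X_0(u) \ge \tilde{X}_0(u)$, in view of the assumption $\pi \succeq_{st} \tilde{\pi}$.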
Letting $I(0) = I(0; U_0) := X_0(U_0)$, and recalling the meaning of $m(0),\dots,m(k-1)$, $m(k) = 1$,
we recursively define, for $n = 1,2,\dots$

$X_{m(I(0))+\dots+m(I(n-1))} = X_{m(I(0))+\dots+m(I(n-1))}(U_0,\dots,U_n) := \inf\{s \le k : \sum_{l=0}^{s} p^{(m(I(n-1)))}_{I(n-1),l} > U_n\}, \quad (6)$

$I(n) = I(n; U_0,\dots,U_n) := X_{m(I(0))+\dots+m(I(n-1))}. \quad (7)$

We notice that, if for a given $n$, we obtain $I(n) = k$ then also

$I(n+1) = k$ and $X_{m(I(0))+\dots+m(I(n-1))+1} = k$.

We claim that

$T_k = \sum_{r=0}^{L-1} m(I(r)), \quad (8)$

where $L$ is the random index

$L := \inf\{l \in \mathbb{N} : I(l) = k\}. \quad (9)$

We stipulate that the sum is equal to zero if $L = 0$. It is clear that, when $L = 0$, (8) holds.
Now we prove that also in the case $L > 0$ the expression for $T_k$ given in (8) holds true. In fact, at
least the inequality $T_k \le \sum_{r=0}^{L-1} m(I(r))$ holds, since $I(L) = X_{\sum_{r=0}^{L-1} m(I(r))} = k$.

We then want to show that for any $t < \sum_{r=0}^{L-1} m(I(r))$ one has $X_t \ne k$. Let us first consider
the values $a_s = \sum_{r=0}^{s-1} m(I(r))$ with $s = 1,\dots,L-1$. For these values, $X_{a_s} = k$ would contradict
the position (9).

Let us then consider the discrete intervals of the form $B_s = \{a_s + 1,\dots,a_{s+1} - 1\}$, with
$s \in \{1,\dots,L-1\}$ and such that $a_s + 1 \le a_{s+1} - 1$. For $a \in B_s$, it is impossible that $X_a = k$. In
fact $i + m(i) \le k$ for $i \le k-1$ and $X_c - X_b \le c - b$ for any $c > b$.

We now proceed to construct $\tilde{T}_k$. To this purpose we consider a sequence of independent
Markov chains $\{Y^{(r)}\}_{r \in \mathbb{N}}$. For any $r = 0,1,\dots$, the Markov chain $Y^{(r)} = \{Y^{(r)}_n\}_{n \in \mathbb{N}}$ will be such
that $Y^{(r)}_0 = 0$ with probability one and it will admit $\tilde{P}$ as transition matrix. More precisely $Y^{(r)}$
is constructed in terms of $\{U_{r,1}, U_{r,2}, \dots\}$ as follows: for $n = 1,2,\dots$

$I^{(r)}(n-1) := Y^{(r)}_{n-1}(U_{r,1},\dots,U_{r,n-1}), \quad (10)$

$Y^{(r)}_n(U_{r,1},\dots,U_{r,n}) := \inf\{s \le k : \sum_{l=0}^{s} \tilde{p}_{I^{(r)}(n-1),l} > U_{r,n}\}. \quad (11)$

As a function of $U_0, U_1, \dots$, we now also define the sequence $Y = \{Y_n\}_{n \in \mathbb{N}}$ as follows:

$Y_0 = Y_0(U_0) := \tilde{X}_0(U_0), \quad (12)$

$Y_n(U_0,\dots,U_n) := \inf\{s \le k : \sum_{l=0}^{s} \tilde{p}^{(m(I(n-1)))}_{I(n-1),l} > U_n\}. \quad (13)$
Notice that the random variables $I(0), I(1), \dots$ appearing in the r.h.s. of (13) have been defined in
(7). Furthermore we have, by construction, $Y_n(U_0,\dots,U_n) \le I(n; U_0,\dots,U_n)$ in view of condition
ii). The sequences $Y$ and $\{Y^{(r)}\}_r$ are stochastically independent.
Let now, for $r \in \mathbb{N}$,

$N_1^{(r)} := \inf\{n \in \mathbb{N} : Y^{(r)}_n = Y_r\}, \quad (14)$

$N_2^{(r)} := \inf\{n \in \mathbb{N} : Y^{(r)}_n = I(r)\}. \quad (15)$

We notice that, for any $r = 0,1,2,\dots$, $N_1^{(r)} \le N_2^{(r)}$ since the chain $Y^{(r)}$ starts in zero, it increases
by at most one unit at any step, and $Y_r \le I(r)$. For any $r \in \mathbb{N}$, $N_1^{(r)}$ and $N_2^{(r)}$ are two stopping
times with respect to the filtration $\{\mathcal{F}^{(r)}_n\}_{n \in \mathbb{N}}$ where $\mathcal{F}^{(r)}_0 = \sigma(Y_r, I(r))$ and

$\mathcal{F}^{(r)}_n = \sigma(Y_r, I(r), U_{r,1},\dots,U_{r,n})$ for any $n = 1,2,\dots$.
Now we consider the random variables

$Z_r := \sum_{n=0}^{r-1} (N_2^{(n)} - N_1^{(n)}) + \sum_{n=0}^{r-1} m(I(n)), \quad (16)$

for $r = 1,\dots,L$ and

$\tilde{X}_{Z_r} := Y_r. \quad (17)$

Notice that, letting $r = L$ in (16), one has

$Z_L := \sum_{n=0}^{L-1} (N_2^{(n)} - N_1^{(n)}) + \sum_{n=0}^{L-1} m(I(n)) = \sum_{n=0}^{L-1} (N_2^{(n)} - N_1^{(n)}) + T_k. \quad (18)$

Furthermore, by recalling definition (14), $\tilde{X}_{Z_r} = Y_r = Y^{(r)}_{N_1^{(r)}}$.

We now consider the following sequence of random variables:

$Y_0,\ Y^{(0)}_{N_1^{(0)}+1},\dots,Y^{(0)}_{N_2^{(0)}},\ Y_1,\ \dots,\ Y^{(1)}_{N_2^{(1)}},\ Y_2,\ \dots \quad (19)$

obtained by gluing together the sections of trajectories $Y^{(r)}_{N_1^{(r)}+1},\dots,Y^{(r)}_{N_2^{(r)}}$.

Notice that some of the sections $Y^{(r)}_{N_1^{(r)}+1},\dots,Y^{(r)}_{N_2^{(r)}}$ can be missing. This happens when $Y_r = I(r)$.

We now set

$\tilde{X}_{Z_r+i} := Y^{(r)}_{N_1^{(r)}+i}, \quad i = 1,\dots,N_2^{(r)} - N_1^{(r)}. \quad (20)$

In view of the strong Markov property, the joint probability distribution of the random variables
$\tilde{X}$'s defined by (17) and (20) coincides, by construction, with a finite dimensional distribution for
a Markov chain with initial law $\tilde{\pi}$ and transition matrix $\tilde{P}$. The random variables $\tilde{X}$'s have not
been defined for every time $t \in \mathbb{N}$. By Kolmogorov's existence theorem, we can consider however
the entire chain $\tilde{X}_0, \tilde{X}_1, \tilde{X}_2, \dots$ by suitably adding variables at the missing times. From (9) and
(20) we have

$\tilde{X}_{T_k + \sum_{r=0}^{L} (N_2^{(r)} - N_1^{(r)})} = k. \quad (21)$

Furthermore, by repeating the same argument used above, we can also obtain

$\tilde{X}_l < k, \quad (22)$

for $l = 0,\dots,T_k + \sum_{r=0}^{L} (N_2^{(r)} - N_1^{(r)}) - 1$. Thus

$\tilde{T}_k = T_k + \sum_{r=0}^{L} (N_2^{(r)} - N_1^{(r)}),$

therefore $\tilde{T}_k \ge T_k$, whence the stochastic comparison in (3) follows. □

A related result is the following one, which appeared as Theorem 4.1 in [10]. Such a result
gives a stronger conclusion with respect to Theorem 1 but under much stronger conditions. Its
proof can be also obtained along the same line of Theorem 1.

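Theorem 2. Let $X$ and $\tilde{X}$ belong to $\mathcal{T}_k$ with transition matrices $P = (p_{i,j})_{i,j \in E}$, $\tilde{P} = (\tilde{p}_{i,j})_{i,j \in E}$,
and initial distributions $\pi_0 = (\pi_0(i))_{i \in E}$, $\tilde{\pi}_0 = (\tilde{\pi}_0(i))_{i \in E}$, respectively. Under the conditions

$p_{i,\cdot} \succeq_{st} \tilde{p}_{i,\cdot}$ for each $i = 0,\dots,k-1$, \quad (23)

$\pi_0 \succeq_{st} \tilde{\pi}_0, \quad (24)$

one has the stochastic comparison

$T_h \preceq_{st} \tilde{T}_h$, for $h = 1,\dots,k$, \quad (25)

where $T_h$ is defined for $X$ in (1) and $\tilde{T}_h$ is the analogue for $\tilde{X}$.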
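The statement of Theorem 2 can be sanity-checked numerically on small chains. The sketch below is only an illustration, not the paper's method: `rows_dominate` tests condition (23) through cumulative sums, and `survival` computes $P(T_h > n)$ exactly by freezing the mass that has reached level $h$ or higher. The function names, the two $3 \times 3$ matrices and the start in state 0 are all assumptions made for the example.

```python
from itertools import accumulate


def rows_dominate(P, Q, tol=1e-12):
    """Condition (23): row i of P dominates row i of Q, for i = 0, ..., k-1."""
    for row_p, row_q in zip(P[:-1], Q[:-1]):
        if any(a > b + tol for a, b in zip(accumulate(row_p), accumulate(row_q))):
            return False
    return True


def survival(P, h, horizon, start=0):
    """Return [P(T_h > n) for n = 0..horizon] for the chain started at `start`."""
    size = len(P)
    dist = [0.0] * size
    dist[start] = 1.0
    out = []
    for _ in range(horizon + 1):
        out.append(sum(dist[:h]))          # mass that has not yet reached level h
        new = [0.0] * size
        for i in range(size):
            if i >= h:
                new[i] += dist[i]          # levels >= h are frozen
            else:
                for j in range(size):
                    new[j] += dist[i] * P[i][j]
        dist = new
    return out


# Hypothetical skip-free chains on {0, 1, 2}; P climbs more readily than P_tilde.
P = [[0.5, 0.5, 0.0], [0.3, 0.2, 0.5], [0.0, 0.0, 1.0]]
P_tilde = [[0.7, 0.3, 0.0], [0.5, 0.2, 0.3], [0.0, 0.0, 1.0]]
```

When `rows_dominate(P, P_tilde)` holds and both chains start from 0, the list `survival(P, h, n)` should lie below `survival(P_tilde, h, n)` entry by entry, which is the comparison (25).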

Remark 1. The method of proof of Theorem 1 can be also convenient for implementation in
computer programs and it is based on the skip-free property of Markov chains. A similar method
of coupling had been already implemented in [7] with the aim to simulate such Markov chains and
to analyze Theorem 4.1 in [10].

In our context, we also notice that such a method leads to efficient estimates of the difference
between expected values of two different first-passage times. In fact, as can be easily proven,
it is more accurate than one based on the separate estimates of the two expected values. This
circumstance turns out to be useful in several situations of interest.

For instance, in the comparison between waiting times to words' occurrences, where the expected
values can be extremely large and their differences relatively small, the separate estimates of the expected values
might prove numerically inaccurate if compared with direct estimation of the difference between
them.
Remark 2. We also notice that Theorem 1 can be directly extended to the infinite case $E = E_\infty$.
In such a frame, it can also be useful to analyze recurrence properties of Markov chains satisfying
(2).

We present an example in which the hypothesis of Theorem 1 is satisfied but the hypothesis of
Theorem 2 fails.

Example 1. Let us consider $E = \{0,1,2,3\}$ and the transition matrices

[Display (26): the $4 \times 4$ transition matrices $P$ and $\tilde{P}$, whose entries depend on $\epsilon$; the entries are not legible in this copy.]

where $\epsilon > 0$. The hypothesis of Theorem 2 is never satisfied for any positive $\epsilon$. Taking the product of
the matrices we obtain

[Display (27): the two-step matrices $P^2$ and $\tilde{P}^2$; the entries are not legible in this copy.]

Therefore we can take $m(0) = 2$ and $m(1) = m(2) = 1$ to verify the hypothesis of Theorem 1 when
$\epsilon$ is small enough, thus showing that $T_3 \preceq_{st} \tilde{T}_3$.

The following result appears, at a first glance, to be similar to Theorem 2. However it offers a
much wider range of applications. In Section 4 examples will be presented in the frame of word
occurrences. Also the proof of this result can be obtained along the same line of Theorem 1 and
will then be omitted.

Theorem 3. Consider two transition matrices in the space $\mathcal{T}_k$, namely $P = (p_{i,j} : i,j = 0,\dots,k)$
and $\tilde{P} = (\tilde{p}_{i,j} : i,j = 0,\dots,k)$. Let the initial measures be $\pi = \tilde{\pi} = \delta_0$ (both the Markov chains
start in zero almost surely). Suppose that there exists an integer $m \in [1, k-1]$ such that

(i) $T_m \preceq_{st} \tilde{T}_m$,

(ii) for each $i \in [m, k-1]$, $\tilde{p}_{i,i+1} \le p_{i,i+1}$ and $\tilde{p}_{i,0} + \tilde{p}_{i,i+1} = 1$.

Then $T_i \preceq_{st} \tilde{T}_i$ for $i \in [m,k]$.



3 Asymptotic stochastic comparisons 

In this section, we turn to compare the tail behaviors of two first-passage times $T_k$ and $\tilde{T}_k$. The
next result, in fact, aims at establishing an asymptotic form of stochastic comparison between $T_k$
and $\tilde{T}_k$. From this point on in this section we can abandon the condition that the Markov chains
of interest belong to the class $\mathcal{T}_k$. The following two definitions are now needed.
Definition 1. Given two random variables $X$ and $Y$ we write $X \preceq_{a.st.} Y$ if there exists $t_0$ such
that $P(X > t) \le P(Y > t)$, for each $t > t_0$.

Definition 2. Let $A = (a_{i,j} : i,j = 1,\dots,k)$ and $A' = (a'_{i,j} : i,j = 1,\dots,k)$ be stochastic matrices
of a given order $k$. We write $A \trianglelefteq A'$ if and only if

$a_{i,\cdot} \preceq_{st} a'_{j,\cdot}, \quad \forall\, i \le j \le k. \quad (28)$

Remark 3. The relation $\trianglelefteq$ is stronger than (23) and it is transitive. However it is not reflexive.
In this respect we have the following fact: let $X$ be a homogeneous Markov chain on the state
space $E = \{0,1,\dots,k\}$, with transition matrix $P$. The relation $P \trianglelefteq P$ holds if and only if $X$ is
stochastically monotone.
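The relation of Definition 2 reduces to finitely many comparisons of cumulative row sums, so it can be checked directly. The following is a small sketch under the assumption that matrices are lists of rows; the names `st_leq`, `related` and `is_stochastically_monotone`, and the two example matrices, are hypothetical.

```python
from itertools import accumulate


def st_leq(p, q, tol=1e-12):
    """p <=_st q on {0, ..., k}: every cumulative sum of p is at least that of q."""
    return all(a + tol >= b for a, b in zip(accumulate(p), accumulate(q)))


def related(A, B, tol=1e-12):
    """Definition 2: row i of A is <=_st row j of B whenever i <= j."""
    size = len(A)
    return all(st_leq(A[i], B[j], tol)
               for i in range(size)
               for j in range(i, size))


def is_stochastically_monotone(P):
    """Remark 3: the relation holds between P and itself iff the chain is monotone."""
    return related(P, P)


# Hypothetical examples: rows of M increase stochastically, rows of S do not.
M = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.0, 0.0, 1.0]]
S = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

The matrix `S` swaps states 0 and 1, so its rows are not stochastically increasing and the relation fails to be reflexive on it, in line with Remark 3.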

In what follows we consider that the state $k$ is absorbing and we want to compare the asymptotic
behavior of the absorbing time in the state $k$ for two Markov chains. We also assume that the
initial measure for all the chains is concentrated on the state zero.

The previous two definitions are of interest in the present context in view of the following 
result. 

Theorem 4. If $\tilde{P}^n \trianglelefteq P^n$ for each $n$ large enough then $T_k \preceq_{a.st.} \tilde{T}_k$.

Proof. We want to check that $P(T_k > L) \le P(\tilde{T}_k > L)$ for $L$ large enough. We remark, since the
state $k$ is absorbing, that the identity $\{T_k > L\} = \{X_L \ne k\}$ holds. Then $P_0(T_k > L) = 1 - p^{(L)}_{0,k}$.
Similarly we obtain $P_0(\tilde{T}_k > L) = 1 - \tilde{p}^{(L)}_{0,k}$. The thesis then follows from the inequality $\tilde{P}^L \trianglelefteq P^L$. □
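The identity used in the proof, $P_0(T_k > L) = 1 - p^{(L)}_{0,k}$, turns the tail of the absorption time into a single entry of a matrix power. The sketch below illustrates it with plain nested lists; the function names and the two-state example are assumptions made here, and the last state is assumed absorbing.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(P, L):
    """L-th power of P by repeated multiplication (L >= 0)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(L):
        result = mat_mul(result, P)
    return result


def tail_probability(P, L):
    """P_0(T_k > L) = 1 - p^(L)_{0,k}, valid when the last state k is absorbing."""
    return 1.0 - mat_pow(P, L)[0][-1]


# Hypothetical two-state chain: absorption happens with probability 1/2 per step.
P_geo = [[0.5, 0.5],
         [0.0, 1.0]]
```

For `P_geo` the absorption time is geometric, so `tail_probability(P_geo, L)` equals $2^{-L}$.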

In the following, we present some results concerning the condition that two transition matrices
are such that $\tilde{P}^n \trianglelefteq P^n$ for $n$ large enough.

In the following result we give a probabilistic characterization of the relation $\trianglelefteq$.

Lemma 1. Let $A$, $A'$ be stochastic matrices on the state space $E = \{0,\dots,k\}$. $A \trianglelefteq A'$ if and only
if a Markov chain $(Z_n)_{n=0,1}$ with $Z_n = (Y_n, Y'_n)$ on the state space $E^2$ exists with the following
properties:

i) $(Y_n)_{n=0,1}$ is a Markov chain with transition matrix $A$. $(Y'_n)_{n=0,1}$ is a Markov chain with
transition matrix $A'$.

ii) $P(Y_1 \le Y'_1 \mid Y_0 = i, Y'_0 = i') = 1$, for $i \le i' \in E$.

Proof. Assume $A \trianglelefteq A'$. Set

$Y_1(U) := \inf\{r \le k : \sum_{l=0}^{r} a_{i,l} > U\}, \qquad Y'_1(U) := \inf\{r \le k : \sum_{l=0}^{r} a'_{i',l} > U\}, \quad (29)$

where $U$ is uniformly distributed over $[0,1]$. Then, conditionally on $Y_0 = i$ (resp. $Y'_0 = i'$), $Y_1(U)$
(resp. $Y'_1(U)$) has the law $a_{i,\cdot}$ (resp. $a'_{i',\cdot}$). Furthermore ii) holds in view of (29).

Conversely, if i) and ii) hold, then (28) follows by definition of stochastic ordering. □
We now show that the relation $\trianglelefteq$ is maintained under products of transition matrices.

Lemma 2. Let $A$, $A'$, $B$, $B'$ be stochastic matrices of order $k$ such that $A \trianglelefteq A'$ and $B \trianglelefteq B'$; then
$AB \trianglelefteq A'B'$.

An elementary but lengthy procedure can be used to prove this result. We prefer to provide a
synthetic proof based on probabilistic arguments.

Proof. Let $U_1$ and $U_2$ be i.i.d. random variables uniformly distributed over $[0,1]$. We consider the
random variables defined by

$Y_1(U_1) := \inf\{r \le k : \sum_{l=0}^{r} a_{Y_0,l} > U_1\}, \qquad Y'_1(U_1) := \inf\{r \le k : \sum_{l=0}^{r} a'_{Y'_0,l} > U_1\}, \quad (30)$

$Y_2(U_2) := \inf\{r \le k : \sum_{l=0}^{r} b_{Y_1,l} > U_2\}, \qquad Y'_2(U_2) := \inf\{r \le k : \sum_{l=0}^{r} b'_{Y'_1,l} > U_2\}. \quad (31)$

The sequence $(Y_n)_{n=0,1,2}$ is a non-homogeneous Markov chain with transition matrix $A$ for the first
step and transition matrix $B$ for the second step. Analogously for $(Y'_n)_{n=0,1,2}$ with $A'$ and $B'$. Now
define $X_0 = Y_0$, $X_1 = Y_2$, $X'_0 = Y'_0$ and $X'_1 = Y'_2$. The random variables $(X_n)_{n=0,1}$ form a
Markov chain with transition matrix $AB$; also $(X'_n)_{n=0,1}$ is a Markov chain with transition matrix
$A'B'$. The pair $(Z_n)_{n=0,1}$ with $Z_n = (X_n, X'_n)$ can be seen as a Markov chain on the state space $E^2$.
In view of Lemma 1 we can conclude the proof by checking that $P(X_1 \le X'_1 \mid X_0 = i, X'_0 = i') = 1$,
for $i \le i' \in E$. In fact, we have

$P(X_1 \le X'_1 \mid X_0 = i, X'_0 = i') = P(Y_2 \le Y'_2 \mid Y_0 = i, Y'_0 = i') = \sum_{i_1 \le i'_1} P(Y_2 \le Y'_2, Y_1 = i_1, Y'_1 = i'_1 \mid Y_0 = i, Y'_0 = i').$

Notice that, in the last equality, we are allowed to extend the sum only on $i_1 \le i'_1$, in view of (30).
By the Markov property of $Z_n$ we obtain

$\sum_{i_1 \le i'_1} P(Y_2 \le Y'_2, Y_1 = i_1, Y'_1 = i'_1 \mid Y_0 = i, Y'_0 = i') = \sum_{i_1 \le i'_1} P(Y_2 \le Y'_2 \mid Y_1 = i_1, Y'_1 = i'_1)\, P(Y_1 = i_1, Y'_1 = i'_1 \mid Y_0 = i, Y'_0 = i'). \quad (32)$

We also have $P(Y_2 \le Y'_2 \mid Y_1 = i_1, Y'_1 = i'_1) = 1$, in view of Lemma 1. Therefore the r.h.s. of (32)
becomes $P(Y_1 \le Y'_1 \mid Y_0 = i, Y'_0 = i')$. The latter term is equal to 1 by (30) and this concludes the
proof. □

Remark 4. Lemma 2 also guarantees that $\prod_{i=1}^{n} A_i \trianglelefteq \prod_{i=1}^{n} B_i$ if $(A_i)_{i=1,\dots,n}$ and $(B_i)_{i=1,\dots,n}$ are
stochastic matrices such that $A_i \trianglelefteq B_i$, for $i = 1,\dots,n$.

The next result has an immediate application to our problem. It in fact provides an (apparently
weaker) condition sufficient for the hypothesis appearing in Theorem 4.
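Theorem 5. Let $P$ and $\tilde{P}$ be stochastic matrices of order $k$. Assume that there exist two coprime
integers $n_1$ and $n_2$ such that

$\tilde{P}^{n_1} \trianglelefteq P^{n_1}, \qquad \tilde{P}^{n_2} \trianglelefteq P^{n_2};$

then

$\tilde{P}^{n} \trianglelefteq P^{n} \quad (33)$

for $n \ge \bar{n}(n_1,n_2) := \inf\{r : \forall r' \ge r,\ r' = a n_1 + b n_2 \text{ with } a, b \in \mathbb{N}\}$.

Proof. For $n \ge \bar{n}(n_1,n_2)$ we can write, by definition of $\bar{n}(n_1,n_2)$, $P^n = (P^{n_1})^a (P^{n_2})^b$ and $\tilde{P}^n =
(\tilde{P}^{n_1})^a (\tilde{P}^{n_2})^b$. Then (33) is readily obtained by Remark 4. □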
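The threshold $\bar{n}(n_1,n_2)$ in Theorem 5 is one more than the largest integer that is not of the form $a n_1 + b n_2$. A brute-force sketch is given below, assuming $0 \in \mathbb{N}$ and relying on the classical fact that, for coprime $n_1, n_2$, every integer from $(n_1-1)(n_2-1)$ on is representable; the function names are hypothetical.

```python
from math import gcd


def representable(n, n1, n2):
    """True if n = a*n1 + b*n2 for some integers a, b >= 0."""
    return any((n - a * n1) % n2 == 0 for a in range(n // n1 + 1))


def threshold(n1, n2):
    """Smallest r such that every r' >= r can be written as a*n1 + b*n2."""
    if gcd(n1, n2) != 1:
        raise ValueError("n1 and n2 must be coprime")
    r = n1 * n2            # safely above (n1 - 1) * (n2 - 1)
    while r > 0 and representable(r - 1, n1, n2):
        r -= 1             # walk down while the integer just below is still representable
    return r
```

For instance, with $n_1 = 3$ and $n_2 = 5$ the integers 1, 2, 4, 7 are not representable, so the scan stops at 8, which agrees with $(n_1-1)(n_2-1)$.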

We notice that the conditions given in Theorem 5 can be encountered rather often. A simple
sufficient condition will be presented in the following Theorem. To this purpose we need the
following notation.

Given a stochastic matrix $P = (p_{i,j} : i,j = 0,\dots,k)$, such that $p_{i,k} < 1$ for $i = 0,\dots,k-1$,
denote by ${}_{(k)}P$ the matrix obtained from $P$ by making $k$ a taboo state: ${}_{(k)}p_{i,j} = p_{i,j}/(1 - p_{i,k})$.
Let us denote by $\lambda(P)$ the spectral gap of $P$.

Theorem 6. Let $P$ and $\tilde{P}$ be two stochastic matrices on the state space $E = \{0,\dots,k\}$. Suppose
that $p_{k,k} = \tilde{p}_{k,k} = 1$ and that ${}_{(k)}\tilde{P}$ is regular. Assume furthermore that $\lambda(\tilde{P}) < \lambda(P)$. Then there
exists $n_0$ such that $\tilde{P}^n \trianglelefteq P^n$, for $n \ge n_0$.

Proof. First we notice that $\lambda(P)$ is larger than zero. Therefore, from each state $i \in E$, the Markov
chain associated to $P$ can reach the state $k \in E$ in a finite number of steps.

In this respect we will more precisely prove that there exists $C > 0$ such that the following
inequality holds for any positive integer $n$ and $i, j \in \{0,\dots,k-1\}$

$p^{(n)}_{i,j} \le n^k C (1 - \lambda(P))^n. \quad (34)$

Concerning the matrix $\tilde{P}$ we will prove, on the other hand, that there exists $c > 0$ such that for $n$
large enough

$\tilde{p}^{(n)}_{i,j} \ge c (1 - \lambda(\tilde{P}))^n. \quad (35)$

If (34) and (35) hold, we get, for $n$ large enough, the inequalities

$\sum_{j=0}^{l} p^{(n)}_{i,j} \le \sum_{j=0}^{l} \tilde{p}^{(n)}_{i',j}$

for $l, i, i' \in \{0,\dots,k-1\}$. This guarantees $\tilde{P}^n \trianglelefteq P^n$ for $n$ large enough and concludes the proof.

In order to get the inequality in (34) we can consider the Jordan representation $P = A^{-1} J A$
for the stochastic matrix $P$, so that we can write $|(J^n)_{i,j}| \le n^k (1 - \lambda(P))^n$, for $i, j \in \{0,\dots,k-1\}$.
By developing the products with $A^{-1}$ and $A$ we obtain (34) in view of the assumption $\lambda(P) > 0$.

In order to show (35) we first notice that there is only one eigenvalue of modulus $(1 - \lambda(\tilde{P}))$
as a consequence of the regularity of the transition matrix ${}_{(k)}\tilde{P}$. This is an easy consequence of
the Perron-Frobenius theorem, see [15]. We will denote by $\mu$ such an eigenvalue, which is actually
real (again as a consequence of the Perron-Frobenius theorem). We start considering the case where
$\lambda(\tilde{P}) > 0$.

In such a case, we use the Jordan representation of $\tilde{P}$. We explicitly write

$\tilde{p}^{(n)}_{i,k} = \sum_{l=0}^{k} \sum_{m=0}^{k} (A^{-1})_{i,l} (J^n)_{l,m} (A)_{m,k}. \quad (36)$

To fix the ideas we consider the case in which $(J)_{k-1,k-1} = \mu$ (the second highest eigenvalue).
Furthermore we are allowed to limit attention only to indexes $0 \le i \le k-1$. In fact, for $i = k$ the
terms $\tilde{p}^{(n)}_{k,j}$ are zero for $j = 0,\dots,k-1$ and one for $j = k$. From (36) we obtain

$\tilde{p}^{(n)}_{i,k} = (A^{-1})_{i,k} A_{k,k} + (A^{-1})_{i,k-1}\, \mu^n\, (A)_{k-1,k} + o(|\mu|^n), \quad (37)$

where $(A^{-1})_{i,k} A_{k,k} = 1$ because $\lambda(\tilde{P}) > 0$. In this respect we claim that the products $(A^{-1})_{i,k-1} A_{k-1,j}$
can not be all null. In fact, if this were the case, we would have that $\tilde{p}^{(n)}_{i,j}$ does not depend on $n$,
which is absurd (we remember that the sum on the rows is equal to one for each $n$). Therefore
there exist $i, j \in \{0,1,\dots,k-1\}$ such that $(A^{-1})_{i,k-1} A_{k-1,j} \ne 0$ and, for $n$ large enough,



2^ >i,k-lt^ y^^lk-l,j 



and 



pfl<l-\{A-\,__,^,-{A)k-,,k. (39) 

Inequality (38) is guaranteed by the fact that the sum on the rows is equal to one.

Now using the regularity of the Markov chain we obtain that for $i, j \in \{0,1,\dots,k-1\}$

pSJ > c{A-\k-il^^{A)k-i,j (40) 

where $c$ is a positive constant. If $\lambda(\tilde{P}) = 0$, the states $\{0,\dots,k-1\}$ do not communicate
with the state $k$; therefore there exists an invariant measure $\pi = (\pi_0, \pi_1, \dots, \pi_{k-1}, 0)$ with $\pi_i > 0$
for $i = 0,\dots,k-1$ (it is a consequence of the regularity of ${}_{(k)}\tilde{P}$). Therefore (35) is trivially
satisfied. This ends the proof.

□ 

Remark 5. As a consequence of Theorem 6 we obtain, for a single Markov chain with transition
matrix $P$ such that ${}_{(k)}P$ is regular, the large deviation equality $\lim_{n \to \infty} \frac{1}{n} \ln p^{(n)}_{i,j} = \ln \mu$, where $\mu =
1 - \lambda(P)$.

Remark 6. With obvious meaning of notation, consider the following conditions:

a) $\lambda(\tilde{P}) < \lambda(P)$;

b) $\exists\, n_0 : \forall n \ge n_0$, $\tilde{P}^n \trianglelefteq P^n$;

c) $T_k \preceq_{a.st.} \tilde{T}_k$.

By summarizing Theorem 6 and Theorem 4, we have the implications a) $\Rightarrow$ b) and b) $\Rightarrow$ c). It is
immediate to find examples to show that both the reverse implications fail.

Concerning the interest of the relation < , we also notice that it can have the following advan- 
tages with respect to the spectral-gap analysis when studying asymptotic stochastic order: 

• computing products of matrices can be easier than computing eigenvalues; 

• if the entries of the matrices are all rational, one can perform computations (using matrices 
with integer entries) without any approximation; 

• one can use the same calculations to study the asymptotic and the usual stochastic orders. 
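The second point can be illustrated with Python's `fractions` module, which keeps every entry of a matrix power exact. The sketch below checks the relation of Definition 2 between the powers of two matrices without any rounding; the helper names and the two $3 \times 3$ matrices are hypothetical, and the comparison direction (rows of the first argument stochastically below rows of the second) is the one assumed in this rewrite of Definition 2.

```python
from fractions import Fraction
from itertools import accumulate


def mat_mul(A, B):
    """Exact product of two square matrices with Fraction entries."""
    size = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def related(A, B):
    """Row i of A is <=_st row j of B for all i <= j (exact comparison)."""
    size = len(A)
    for i in range(size):
        cum_a = list(accumulate(A[i]))
        for j in range(i, size):
            cum_b = list(accumulate(B[j]))
            if any(x < y for x, y in zip(cum_a, cum_b)):
                return False
    return True


def orders_where_related(A, B, n_max):
    """Exponents n in 1..n_max for which the n-th powers of A and B are related."""
    hits, A_n, B_n = [], A, B
    for n in range(1, n_max + 1):
        if related(A_n, B_n):
            hits.append(n)
        A_n, B_n = mat_mul(A_n, A), mat_mul(B_n, B)
    return hits


half, quarter = Fraction(1, 2), Fraction(1, 4)
zero, one = Fraction(0), Fraction(1)
# Hypothetical chains with absorbing state 2: P_fast climbs with probability 1/2,
# P_slow with probability 1/4.
P_fast = [[half, half, zero], [zero, half, half], [zero, zero, one]]
P_slow = [[1 - quarter, quarter, zero], [zero, 1 - quarter, quarter], [zero, zero, one]]
```

Here the relation already holds for the first power, so by Lemma 2 it is expected to hold for every power, and `orders_where_related(P_slow, P_fast, 5)` lists all exponents from 1 to 5.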

4 Applications and occurrences of words 

In this section we discuss some applications of the results of Section 2. As mentioned in the
Introduction, our results can in particular be applied in the frame of words' occurrences. Let
$A_N = \{a_1,\dots,a_N\}$ be the alphabet composed by the $N$ letters $a_1,\dots,a_N$. An ordered sequence
$w = w_1 w_2 \cdots w_k$, where each of the elements $w_j$ is one of the letters taken from $A_N$, is then seen
as a word of length $k$ on $A_N$. We consider the space $A_N^k$ of all possible words of length $k$ on $A_N$.
Assume that, at any instant $n = 1,2,\dots$, a letter is drawn at random from the alphabet $A_N$.
Drawings are supposed to be independent and uniformly distributed over $A_N$. We define the space
$\Omega = A_N^{\mathbb{N}}$; for $\omega = (\omega_1, \omega_2, \dots) \in \Omega$, we refer to $\omega_n$ as the letter at time $n \in \mathbb{N}$. The probability
measure on $\Omega$ is then the product measure that, at any drawing, assigns probability $1/N$ to each
letter of $A_N$:

$P(\omega_n = a) = \frac{1}{N}, \quad a \in A_N, \quad n \in \mathbb{N}.$

For any word $w = w_1 w_2 \cdots w_k$, $w \in A_N^k$, we consider the stopping time

$T_w := \inf\{n \ge k \mid \omega_{n-k+1} = w_1, \dots, \omega_n = w_k\},$

i.e. the random time until the first occurrence of $w$.

This scheme gives also rise to a homogeneous Markov chain $X = \{X_n\}_{n \in \mathbb{N}}$ with state space
$E = \{0,1,\dots,k\}$. The Markov chain $X$ is defined as follows:

i) $X_0 = 0$.

ii) For $n \ge 1$ and $i \in \{1,\dots,k \wedge n\}$, one has $X_n = i$ if

a. $\omega_{n-i+1}\omega_{n-i+2} \cdots \omega_n = w_1 w_2 \cdots w_i$;

b. $\omega_{n-h+1}\omega_{n-h+2} \cdots \omega_n \ne w_1 w_2 \cdots w_h$, $\forall h = i+1,\dots,k \wedge n$;

iii) One has $X_n = 0$ if $\omega_{n-i+1}\omega_{n-i+2} \cdots \omega_n \ne w_1 w_2 \cdots w_i$ for all $1 \le i \le k \wedge n$.

Under these positions, $T_w$ coincides with the time of first visit to the state $k$ for the chain.
For our purposes we sometimes denote by $P^{(w)} = (p^{(w)}_{i,j})$ the transition matrix of such a Markov
chain associated to $w \in A_N^k$ and denote by $A_w = \{a \in A_N : a \in w\}$ the alphabet formed by all the
distinct letters of $w$. The alphabet $A_w$ is then the minimal alphabet of $w$.
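The transition matrix $P^{(w)}$ can be built mechanically from rules i)-iii): from state $i$, after reading a letter, the new state is the length of the longest suffix of $w_1 \cdots w_i$ followed by that letter which is also a prefix of $w$. The sketch below follows this reading; the names `next_state` and `word_chain` are hypothetical, words and alphabets are plain Python strings, and letters are assumed uniform as in the text.

```python
from fractions import Fraction


def next_state(word, state, letter):
    """Longest prefix of `word` that is a suffix of word[:state] + letter."""
    text = word[:state] + letter
    for j in range(min(len(text), len(word)), 0, -1):
        if text.endswith(word[:j]):
            return j
    return 0


def word_chain(word, alphabet):
    """Transition matrix on {0, ..., k} for uniform letters drawn from `alphabet`."""
    k = len(word)
    prob = Fraction(1, len(alphabet))
    P = [[Fraction(0)] * (k + 1) for _ in range(k + 1)]
    for i in range(k + 1):
        for letter in alphabet:
            P[i][next_state(word, i, letter)] += prob
    return P
```

For the word `"ab"` on the alphabet `"ab"`, state 1 is left towards state 2 on reading `b` and is kept on reading `a`, so the second row is $(0, 1/2, 1/2)$. Every matrix produced this way satisfies (2), since a single letter can extend the matched prefix by at most one position.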

Furthermore, for $w = w_1 w_2 \dots w_k$, we denote by $\epsilon_w$ the leading number associated to $w$. The
latter is defined as the binary vector

$\epsilon_w = (\epsilon_w(1), \epsilon_w(2), \dots, \epsilon_w(k))$

where each $\epsilon_w(u)$ is equal to 0 or to 1, according to the following position: for $u = 1,2,\dots,k$

$\epsilon_w(u) = \mathbf{1}\{w_{k-u+1} = w_1, \dots, w_k = w_u\}.$

Leading numbers have been introduced by J. Conway and have been repeatedly used in the
applied probability literature (see in particular [12], [14], [2]), to deal with the stochastic framework
described above. In particular, the distribution of $T_w$ only depends on the leading number and,
as a function of it, the mean value $E(T_w)$ has the explicit expression $E(T_w) = \sum_{u=1}^{k} N^u \epsilon_w(u)$.
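Both the leading number and the mean waiting time are immediate to compute from the definitions above. The following sketch assumes words are Python strings and letters are uniform on an alphabet of size $N$; the function names are hypothetical.

```python
def leading_number(word):
    """Binary vector (eps(1), ..., eps(k)): eps(u) = 1 iff the last u letters equal the first u."""
    k = len(word)
    return [1 if word[k - u:] == word[:u] else 0 for u in range(1, k + 1)]


def expected_waiting_time(word, N):
    """E(T_w) = sum over u of N**u * eps_w(u), for uniform letters on N symbols."""
    return sum(N ** u * eps for u, eps in enumerate(leading_number(word), start=1))
```

For a fair coin ($N = 2$) the word `HTH` has leading number $(1,0,1)$ and mean waiting time $2 + 8 = 10$, while `HTT` has leading number $(0,0,1)$ and mean waiting time 8.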

In what follows, we rather analyze stochastic comparisons between the times $T_w$ and $T_{w'}$ of
occurrences for two different words $w$ and $w'$ of the same length $k$. For this purpose we primarily
apply the results of previous sections. We shall see furthermore that an analysis based on the
leading numbers $\epsilon_w$ and $\epsilon_{w'}$ can usefully be combined with such results.

Let then $P^{(w)} = (p^{(w)}_{i,j})$ and $P^{(w')} = (p^{(w')}_{i,j})$ be the transition matrices corresponding to the two words
$w$, $w'$.

For several pairs $w$, $w'$, it can happen that $P^{(w)}$ and $P^{(w')}$ satisfy a condition of the type (23).
Theorem 2 then gives us a useful criterion to check $T_w \preceq_{st} T_{w'}$.

The stochastic ordering $\preceq_{st}$ is a partial order on the distributions of the times $T_w$. As a first
application of Theorem 2 we now show that such a partial order does admit a maximal element.

For $a \in A_N$, let $\mathbf{a}$ be the word belonging to $A_N^k$ and containing all letters equal to $a$.

Proposition 1. For any word $w \in A_N^k$ and $\mathbf{a} \in A_N^k$, we have $T_w \preceq_{st} T_{\mathbf{a}}$.
Proof. The transition probabilities for the chain associated to $\mathbf{a}$ are given by

$p^{(\mathbf{a})}_{i,i+1} = \frac{1}{N}, \qquad p^{(\mathbf{a})}_{i,0} = 1 - \frac{1}{N},$

for $0 \le i \le k-1$. We then see that the proof is immediately obtained from Theorem 2. □

Under a simple condition, the following result shows that also a minimal element does exist
w.r.t. $\preceq_{st}$. For $N \ge k$, let $\hat{w}$ be the word $a_1 a_2 \cdots a_k$, made with the first $k$ letters of the alphabet.

Proposition 2. Let $N \ge k$. For any word $w \in A_N^k$ and for $\hat{w} \in A_N^k$, we have $T_{\hat{w}} \preceq_{st} T_w$.
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Proof. If the leading number associated to the word w = wiW2 ■ ■ ■ Wk-iWk is (0, 0, ... , 0, 1), then 
there is nothing to prove because the distributions of Tw and Tw are equal. Suppose then that the 
leading number of the word w contains more than only one 1. In such a case w has a repetition 
of at least one letter and we can then suppose that a^v, say, is not contained among its letters, 
in view of the condition N > k. Let us consider the word w = wiW2 ■ ■ ■ Wk-idN- It is clear that 
the leading number associated with the word w is (0, 0, ... , 0, 1). Therefore the distribution of 
is the same as the one of T^. Hence, in order to show the stochastic comparison T^, we 

prove '^st ^w- The associated Markov chains are easy to analyze because for i = 0, . . . ,k — 2 
and I = 0, . . . ,k the transition probabilities verify 

(w) (w) 

Pi,i =ph ■ 

As far as the transitions from the state k — 1 are concerned, we notice that for the index / = 
max{/ < k : ew(0 = 1} > 1 we can write 

and 

(w) _ (w) 
Pk-l,l — Pk-lp 

for / 7^ 0, L Therefore 

(w) _ (w) 1 

Pk-1,0 — Pk-1,0 "I" jy- 

We then see that the proof is immediately obtained from Theorem [2j 



□ 
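A small worked instance may help to follow the transitions from the state $k-1$. The choice below is ours and purely illustrative: $k = 3$, $N = 4$, alphabet $\{A, B, C, D\}$, $w = ABA$, and $\tilde w = ABD$ the word obtained from $w$ by replacing its last letter with a letter not contained in $w$. We assume that the state of the chain is the length of the longest suffix of the observed letters that is a prefix of the word. The leading number of $w$ is $(1, 0, 1)$, so that $\bar l = 1$. From state $2$ (the letters $AB$ have just been observed) one finds:

```latex
\begin{align*}
p^{(w)}_{2,3} &= \tfrac14 \quad (\text{letter } A), &
p^{(w)}_{2,1} &= 0, &
p^{(w)}_{2,0} &= \tfrac34 \quad (\text{letters } B, C, D),\\
p^{(\tilde w)}_{2,3} &= \tfrac14 \quad (\text{letter } D), &
p^{(\tilde w)}_{2,1} &= \tfrac14 \quad (\text{letter } A), &
p^{(\tilde w)}_{2,0} &= \tfrac24 \quad (\text{letters } B, C).
\end{align*}
```

Hence $p^{(\tilde w)}_{2,1} = p^{(w)}_{2,1} + \frac14$ and $p^{(w)}_{2,0} = p^{(\tilde w)}_{2,0} + \frac14$, while the probability of reaching state $3$ is the same for both words: the chain of $\tilde w$ falls back to a state at least as high as the chain of $w$ does.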

Remark 7. The same string $w = w_1 w_2 \cdots w_k$ can be seen as a word on different alphabets, and we must keep in mind which is the alphabet $A_N$ from which the random letters $\omega_1, \omega_2, \ldots$ are drawn. Normally, such an alphabet does not coincide with the minimal alphabet $A_w$. The probability distribution of $T_w$ depends on $w$ only through the leading number and it depends on the alphabet $A_N$ only through its cardinality $N$. For brevity's sake, such dependence on $N$ is omitted in our notation; however, it cannot be neglected, generally.

In applying Theorem 2 to word occurrences, the following proposition can be of interest.

Proposition 3. Let condition (23) hold for $P_A$ and $P'_A$. Then condition (23) also holds for $P_{\tilde A}$ and $P'_{\tilde A}$ for any alphabet $\tilde A \supseteq A_w \cup A_{w'}$.

Proof. First we notice that (23) reads

$$\sum_{l=j}^{k} p_{i,l} \ge \sum_{l=j}^{k} p'_{i,l}, \tag{41}$$

for $i = 0, \ldots, k$ and $j = 1, \ldots, k$, and that $p_{i,l}$, $p'_{i,l}$ depend on the alphabet $A$ only through its cardinality $N$. When the alphabet $A$ is replaced by the alphabet $\tilde A$, then each $p_{i,l}$, with $l > 0$, is replaced by $p_{i,l} N / \tilde N$, where $\tilde N$ denotes the cardinality of $\tilde A$. Since the sums in (41) involve only indices $l \ge j \ge 1$, both sides are rescaled by the same positive factor, and all the inequalities in (41) are maintained.

□

Proposition 3 guarantees the following property: once we have proved the inequality $T_w \preceq_{st} T_{w'}$ by checking condition (23) for a sampling alphabet, then not only does $T_w \preceq_{st} T_{w'}$ hold for any other compatible alphabet, but the comparison also still rests on condition (23).

In some cases Theorem 3 can be used to compare two words $w$, $w'$ which cannot be compared by means of Theorem 2. An example follows.

Example 2. Let $w' = (A, A, B, A, A)$ and $w = (A, B, B, B, A)$ be seen as words on an alphabet with $N > 5$. By letting $m = 4$ and by using Proposition [?] one obtains that $T_m \preceq_{st} T'_m$. Furthermore condition (ii) of Theorem 3 is also satisfied. Then we obtain that $T_w \preceq_{st} T_{w'}$. Notice that this example can be easily generalized by adding the same number $\nu$ of letters $A$ on the left and on the right of the two words and by adding a number $\mu$ of letters $B$ in the center. In other terms we are saying that, by means of Theorem 2 and Proposition [?], one can compare two words $w'$ and $w$ whose leading numbers have the form

$$\varepsilon_w(i) = 1 \text{ for } i = 1, \ldots, h; \quad \text{and } \varepsilon_w(i) = 0 \text{ for } i = h+1, \ldots, k-1,$$

$$\varepsilon_{w'}(i) = 1 \text{ for } i = 1, \ldots, h+1; \quad \text{and } \varepsilon_{w'}(i) = 0 \text{ for } i = h+2, \ldots, k-1,$$

with $N > k > 2(h+1)$.
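The two words of this example can be examined numerically. The sketch below is only a finite-horizon check under assumptions of ours: the associated chains are built by taking as state the length of the longest suffix of the observed letters that is a prefix of the word, letters are uniform on a six-letter alphabet, and condition (23) is used in the tail-sum form (41). It tests whether (41) holds in either direction and compares the exact survival functions $P(T > n)$ up to $n = 60$. All names are illustrative.

```python
from fractions import Fraction


def transition_matrix(word, alphabet):
    """Chain on {0, ..., k} for `word` (assumed construction), k absorbing."""
    k, N = len(word), len(alphabet)
    P = [[Fraction(0)] * (k + 1) for _ in range(k + 1)]
    for i in range(k):
        for c in alphabet:
            s = word[:i] + c
            l = max(m for m in range(i + 2) if s.endswith(word[:m]))
            P[i][l] += Fraction(1, N)
    P[k][k] = Fraction(1)
    return P


def tail_dominates(P, Q):
    """Tail-sum condition: sum_{l>=j} P[i][l] >= sum_{l>=j} Q[i][l], j >= 1."""
    k = len(P) - 1
    return all(sum(P[i][j:]) >= sum(Q[i][j:])
               for i in range(k + 1) for j in range(1, k + 1))


def survival(P, horizon):
    """Exact P(T > n), n = 0..horizon, for the chain started in state 0,
    where T is the first time the last state is reached."""
    k = len(P) - 1
    dist = [Fraction(1)] + [Fraction(0)] * k
    out = [1 - dist[k]]
    for _ in range(horizon):
        dist = [sum(dist[i] * P[i][l] for i in range(k + 1))
                for l in range(k + 1)]
        out.append(1 - dist[k])
    return out


alphabet = "ABCDEF"                              # N = 6 > 5
P = transition_matrix("ABBBA", alphabet)         # word w
P_prime = transition_matrix("AABAA", alphabet)   # word w'

# Does the tail-sum condition hold in either direction?
print(tail_dominates(P, P_prime), tail_dominates(P_prime, P))

# Is P(T_w > n) <= P(T_w' > n) for every n up to the horizon?
S, S_prime = survival(P, 60), survival(P_prime, 60)
print(all(a <= b for a, b in zip(S, S_prime)))
```

A check of this kind covers only finitely many values of $n$ and therefore illustrates, but does not prove, the stochastic comparison.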

As noticed in Remark 3 of the previous section, the condition $P \preceq P'$ between two stochastic matrices is stronger than (23); however, in order to guarantee asymptotic comparisons, we only need $P^n \preceq P'^n$ for $n$ large enough. Actually, as we checked by means of several examples, the hypothesis of Theorem 5 often holds for pairs of words $w$, $w'$ such that $E(T_w) < E(T_{w'})$.

In this respect, we can conjecture that $E(T_w) < E(T_{w'})$ implies $T_k \preceq_{a.st.} T'_k$.